Pseudorehearsal in actor-critic agents
Authors
Abstract
Catastrophic forgetting has a serious impact on reinforcement learning, as the data distribution is generally sparse and non-stationary over time. The purpose of this study is to investigate whether pseudorehearsal can improve the performance of an actor-critic agent with neural-network-based policy selection and function approximation in a pole balancing task, and to compare different pseudorehearsal approaches. We expect that pseudorehearsal assists learning even in such very simple problems, given proper initialization of the rehearsal parameters.
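The abstract leaves the rehearsal mechanism itself implicit. As a rough sketch (with hypothetical helper names, not taken from the paper), pseudorehearsal pairs random pseudo-states with the network's own current outputs and replays them alongside real updates, so that new learning does not overwrite the previously learned mapping:

import numpy as np

# Sketch of the generic pseudorehearsal pattern (hypothetical names, not
# the paper's code): pseudo-items are random states labelled with the
# current network's outputs; replaying them during training anchors the
# old input-output mapping and counters catastrophic forgetting.
rng = np.random.default_rng(0)

def make_pseudo_items(predict, n_items, state_dim):
    """Sample random pseudo-states and freeze the net's current outputs."""
    pseudo_states = rng.uniform(-1.0, 1.0, size=(n_items, state_dim))
    pseudo_targets = np.array([predict(s) for s in pseudo_states])
    return pseudo_states, pseudo_targets

def rehearsal_batch(state, target, pseudo_states, pseudo_targets, k=8):
    """Mix one real training pair with k pseudo-items for a single update."""
    idx = rng.integers(0, len(pseudo_states), size=k)
    xs = np.vstack([state[None, :], pseudo_states[idx]])
    ys = np.concatenate([[target], pseudo_targets[idx]])
    return xs, ys

The distribution the pseudo-states are drawn from (here, uniform on [-1, 1]) is one of the "rehearsal parameters" whose initialization the abstract flags as important.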
Similar resources
Pseudorehearsal in actor-critic agents with neural network function approximation
Catastrophic forgetting has a significant negative impact on reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change the performance of an actor-critic agent with neural-network function approximation. We tested the agent in a pole balancing task and compared different pseudorehearsal approaches. We have found that pseudorehearsal can assist learning and decre...
Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning
This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions generated by experts. Then, we incorporate the ...
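The excerpt cuts off before saying how the discriminator output is incorporated. One common GAN-inspired pattern (an assumption here, not confirmed by the truncated abstract) is to treat the discriminator's expert-likeness score as a reward bonus for the policy-gradient update:

import numpy as np

# Assumed reward-shaping pattern for Adversarial A2C (illustrative only;
# the truncated abstract does not state how the discriminator is used):
# the agent is rewarded for actions the discriminator mistakes for
# expert behaviour, as in adversarial imitation learning.
def shaped_reward(env_reward, disc_prob_expert, weight=0.5):
    """Blend the task reward with log D(s, a), the discriminator's
    belief that the (state, action) pair came from an expert."""
    return env_reward + weight * np.log(disc_prob_expert + 1e-8)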
Accelerate Learning Processes by Avoiding Inappropriate Rules in Transfer Learning for Actor-Critic
This paper aims to accelerate the learning process of the actor-critic method, one of the major reinforcement learning algorithms, by transfer learning. In general, reinforcement learning is used to solve optimization problems. Learning agents acquire a policy to accomplish the target task autonomously. To solve such problems, agents require long learning processes of trial and error...
An Actor/Critic Algorithm that is Equivalent to Q-Learning
We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using c...
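The truncated abstract says Q-values are encoded within the actor's policy and the critic's value function. One plausible encoding (an illustrative assumption, not necessarily the paper's construction) is to let the critic hold V(s) and the actor hold preferences that act as advantages, so that Q can be reconstructed:

import numpy as np

# Illustrative (assumed) encoding of Q-values in an actor/critic pair:
# the critic stores V(s) and the actor stores preferences
# p(s, a) ~= Q(s, a) - V(s), so Q is recoverable as their sum.
# A sketch only, not necessarily the paper's construction.
def reconstruct_q(preferences, v_state):
    """Recover Q(s, .) from actor preferences and the critic's V(s)."""
    return v_state + preferences  # Q(s, a) = V(s) + p(s, a)

# Example: greedy action from the reconstructed Q-values.
prefs = np.array([-0.3, 0.0, -1.2])  # preferences for 3 actions in state s
v = 2.5                              # critic's value estimate for s
greedy_action = int(np.argmax(reconstruct_q(prefs, v)))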
Guide Actor-Critic for Continuous Control
Actor-critic methods solve reinforcement learning problems by updating a parameterized policy, known as an actor, in a direction that increases an estimate of the expected return, known as a critic. However, existing actor-critic methods use only the values or gradients of the critic to update the policy parameters. In this paper, we propose a novel actor-critic method called the guide actor-critic (GA...
Journal: CoRR
Volume: abs/1704.04912
Publication date: 2017